THE PREDICTION OF ATTRITION FROM MILITARY SERVICE
INTRODUCTION 

Many situations arise where individuals must be classified into some category on
the basis of observed characteristics. This classification problem is faced daily by
college administrators, bank loan officers, and company employment managers. Applicants
have to be classified as "successes" or "failures" on the basis of their observed
characteristics. Thus, college applicants might be classified as successes or failures
on the basis of factors such as SAT scores and high school ranking, loan applicants on the
basis of income or net worth, and job applicants on the basis of past training and experience.

Beginning with the seminal work of Fisher (reference 1), the classification problem
has been studied intensively in the statistics literature. The approaches to the classification
problem may be separated into two general classes: those based on a linear probability
model, and those based on some non-linear probability distribution such as the logistic
or normal. In either approach, an equation for the probability of being a "success" is fit
to observed data, and the fitted equation is used to predict the success chances of new
applicants. Then, a critical success chance, or qualifying score, is picked. New applicants
whose predicted chances equal or exceed this qualifying score are classified as successes,
while those whose predicted chances are lower are classified as failures. The optimal
score for distinguishing between successes and failures depends upon the expected cost of
misclassifying new applicants.

Despite extensive discussions of the relative efficiency of linear and non-linear models
in the theoretical literature on classification (e.g., references 2 and 3), we have not found
a detailed applied comparison of them. The purpose of this research contribution is to
make such a comparison of these models when they are estimated with very large samples
and used to classify other large cohorts of people.¹

This work is an outgrowth of a study on attrition of first-term enlisted personnel from
the U.S. Navy. With the advent of the all-volunteer force and higher pay scales for enlisted
personnel, attrition (personnel leaving the Navy before completion of their first enlistment)
is becoming more and more costly, and the Navy, as well as the other services,
is under considerable pressure from Congress to reduce it. In the process of estimating
equations for attrition probabilities that could be used for screening applicants with high
chances of attrition, we had to answer the question of which empirical method gave the best
discrimination between attriters and non-attriters.


¹ Nerlove and Press (reference 2) do provide an empirical application of these models,
but they were concerned primarily with estimating probabilities rather than classification.
Also, their work is based on fairly small samples relative to the ones utilized here.

Two linear probability models are compared with two non-linear probability models.
The two linear models are the individual linear and grouped linear probability models,
respectively. The two non-linear models, which are based on the logistic distribution,
are the individual logit and the grouped logit models, respectively. The individual linear,
grouped linear, and grouped logit models are all estimated by ordinary least squares
(OLS) or generalized least squares (GLS), while the individual logit model is estimated by
the method of maximum likelihood.

These four models are reviewed in detail. Theoretical reasons for expecting that the
logit models will provide a better fit to the data are noted. Four models of first-year
attrition are estimated with a sample of 30,000 individuals from the cohort of 67,000
non-prior service males who enlisted in the U.S. Navy in CY 1973. Next, the ability of the
fitted equations to discriminate between the attriters and the non-attriters in a separate
sample of 30,000 individuals from the CY 1973 cohort is analyzed. In addition, we analyze
the ability of grouped linear and grouped logit equations fit with all of the data from the
CY 1973 cohort to discriminate between attriters and non-attriters in the CY 1974 cohort
of non-prior service male enlistees. Finally, we examine the question of which CY 1973
equation gives the better prediction of attrition rates in the CY 1974 cohort.

METHODOLOGIES FOR PREDICTING THE PROBABILITY OF ATTRITION

This section discusses the existing methodologies for estimating attrition probabilities
and discriminating between attriters and non-attriters. It begins with a review of
the individual and grouped linear probability models. The equivalence between the individual
linear probability model and the linear discriminant function is noted. Then, the
two logit models are discussed. Both are consistent and asymptotically efficient and should,
therefore, yield similar parameter estimates in large samples. This is an important
point, since the estimation of the individual logit model is considerably more expensive in
large samples. Theoretical reasons for believing that the logit models will provide a
better fit to the data than the linear models are examined.

Linear  Models 

To begin with, let $X = (X_1, \ldots, X_k)$ be a 1 x k vector of variables which determine the
probability that an individual will be an attriter. Then $p(A \mid X)$ is the conditional probability
that the individual will attrite given X. The problem is to estimate the relationship between
$p(A \mid X)$ and X. One way to do this is to assume a simple linear relationship between $p(A \mid X)$
and X:

$$p(A \mid X) = X\beta, \qquad \beta = (\beta_1, \ldots, \beta_k)' \tag{1}$$

Equation (1) is called the linear probability model. The parameters in the linear probability
model can be estimated two ways.


The Individual Linear Probability Model


The individual linear probability model is estimated by assigning a value of 1 to
attriters and 0 to non-attriters. This binary dependent variable is then regressed on
X. Formally, the model to be estimated is given in (2):

$$Y = X\beta + \varepsilon, \qquad Y = \begin{bmatrix} \mathbf{1} \\ \mathbf{0} \end{bmatrix} \tag{2}$$


Y is an n x 1 vector of observations which may be partitioned into an $n_1$ x 1 vector of ones
representing the $n_1$ attriters in the sample, and an $n_2$ x 1 vector of zeros representing
the $n_2$ non-attriters. X in (2) is an n x k matrix of observations on the independent
variables. The well-known OLS estimator of (2) is shown in (3):

$$\hat\beta = (X'X)^{-1} X'Y \tag{3}$$

After computing $\hat\beta$, the probability that an individual with set of characteristics $X_i$ will
attrite is $\hat p_i = X_i \hat\beta$.
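
The estimator in (3) can be illustrated with a minimal sketch. The data below are simulated, and the single "no diploma" dummy and its attrition rates are assumptions made for illustration only; they are not taken from the Navy cohort.

```python
import numpy as np

def fit_linear_probability(X, y):
    """OLS estimate of beta in y = X beta + e, as in (3)."""
    # lstsq solves the least squares problem without forming (X'X)^-1
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

rng = np.random.default_rng(0)
n = 5000
# Hypothetical regressors: an intercept and one "no diploma" dummy
no_diploma = rng.integers(0, 2, size=n)
X = np.column_stack([np.ones(n), no_diploma])
true_p = 0.10 + 0.15 * no_diploma              # assumed attrition chances
y = (rng.random(n) < true_p).astype(float)     # 1 = attriter, 0 = non-attriter

beta_hat = fit_linear_probability(X, y)
p_hat = X @ beta_hat                           # predicted attrition chances
```

With a single dummy the fitted probabilities are simply the two group attrition rates, which makes the result easy to check by hand.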

The linear model is appealing because of the computational ease of OLS and because
of the ability of OLS to handle very large samples. On the other hand, it has been subject
to criticism in the literature. A major criticism is that the individual linear model
violates the constant variance assumption of OLS. The error term in (2) is binomial: it
can take on the value $-X\beta$ or the value $1 - X\beta$. For the ith observation, the variance of
the error term $\varepsilon_i$ is $X_i\beta(1 - X_i\beta)$.¹ Since the error term is heteroskedastic, the OLS
estimator of $\beta$ will not be the minimum variance linear estimator.

Goldberger (reference 4) suggests the following solution to this problem. First, (2)
is estimated by OLS and the weight $w_i = \sqrt{X_i\hat\beta(1 - X_i\hat\beta)}$ is computed for each individual
in the sample. Then, each $Y_i$ and $X_i$ is weighted by $1/w_i$, and $Y_i/w_i$ is regressed
on $X_i/w_i$. This weighting procedure yields a model with a constant error variance, and
the regression of $Y_i/w_i$ on $X_i/w_i$ gives the generalized least squares (GLS) estimator of
$\beta$, which is the minimum variance unbiased estimator (reference 4, p. 250).

¹ See Goldberger (reference 4) for a discussion of the problem of heteroskedasticity.
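
The two-step weighting procedure can be sketched as follows. This is an illustration on simulated data, not the program used in the study; the function name `goldberger_wls`, the two dummy regressors, and the symmetric clipping of fitted values to the interval [floor, 1 - floor] are assumptions of the sketch.

```python
import numpy as np

def goldberger_wls(X, y, floor=0.02):
    """Two-step weighted least squares for a 0/1 dependent variable."""
    # Step 1: unweighted OLS gives first-round fitted probabilities
    b_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
    p = X @ b_ols
    # Fitted values outside (0, 1) would give inadmissible weights;
    # clipping them is an assumption of this sketch
    p = np.clip(p, floor, 1.0 - floor)
    w = np.sqrt(p * (1.0 - p))            # estimated error standard deviation
    # Step 2: regress y_i / w_i on X_i / w_i
    b_gls, *_ = np.linalg.lstsq(X / w[:, None], y / w, rcond=None)
    return b_ols, b_gls

rng = np.random.default_rng(0)
n = 20000
d1 = rng.integers(0, 2, size=n)           # hypothetical dummy regressors
d2 = rng.integers(0, 2, size=n)
X = np.column_stack([np.ones(n), d1, d2])
y = (rng.random(n) < 0.10 + 0.15 * d1 + 0.05 * d2).astype(float)

b_ols, b_gls = goldberger_wls(X, y)
```

In a sample this large the weighted and unweighted estimates should be close, since both are consistent for the same coefficients; the weighting affects efficiency rather than the quantity being estimated.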

A second criticism of the linear probability model is that it does not restrict p
to lie within the unit interval, although a p outside of this interval could not be interpreted
as a probability estimate. In addition, Goldberger's procedure for correction
for heteroskedasticity is invalidated when predictions outside of the unit interval are obtained.
While the problem of prediction outside of the unit interval should diminish as the
sample size increases,¹ we still encountered it in the empirical work reported below with
a sample of 30,000 observations. Nerlove and Press (reference 2, pp. 54-55) discuss
some work by Smith and Cicchetti (reference 5) on methods for handling inadmissible
weights obtained in the Goldberger procedure. We adopted the one that uses .02 as the
estimate of p for the cases where p was less than zero. While this procedure can be
applied to get around the problem of negative weights in the GLS estimation of $\beta$, the
problem of interpreting the resulting equation as a probability model still remains.

A third criticism of the individual linear probability model is not so serious as it
first appears. It is often stated that since the error term in (2) is not normally distributed,
tests of significance are not exact tests. Ladd (reference 6) shows that despite the
binary form of the dependent variable in (2), the usual tests of significance are exact
tests.


The (unweighted) individual linear model is proportional to the linear discriminant
function (LDF) first proposed by Fisher (reference 1) in 1936 as a means of identifying
binary group membership. The goal of LDF is to derive some linear combination of
known characteristics, say $Z = \lambda' X$, from known data, and then use this linear combination
to identify the group to which a new applicant belongs. For the ith new applicant,
if $Z_i = \lambda' X_i$ is less than some critical value of Z, say $Z_0$, the individual is classified
as a member of group 1 (say, attriter). Otherwise he is classified as a member of group
2 (say, non-attriter).


Beginning with the assumption that X values are distributed multivariate normal
with mean vector $\mu$ and variance-covariance matrix $\Sigma$, the "best" LDF coefficients
are those which maximize $\phi$ in equation (4):

$$\phi = \frac{[\lambda'(\mu_A - \mu_{NA})]^2}{\lambda' \Sigma \lambda} \tag{4}$$

¹ Nerlove and Press (reference 2) note the extreme sensitivity of the OLS estimator of $\beta$ to
the sample in small samples.

In (4), $\mu_A$ is the vector of means of the X values for the attriters and $\mu_{NA}$ is the
vector of means of X values for the non-attriters. Thus, the $\lambda$ vector is chosen such
that the ratio of the squared difference between the means of the two groups, $\lambda'\mu_A$ and
$\lambda'\mu_{NA}$, to their variance, $\lambda'\Sigma\lambda$, is maximized. The $\lambda$ vector that maximizes (4) is
given in (5):

$$\lambda = \Sigma^{-1}(\mu_A - \mu_{NA}) \tag{5}$$


The mean vectors $\mu_A$ and $\mu_{NA}$ and the variance-covariance matrix $\Sigma$ are unobservable.
However, $\lambda$ can be estimated by using the sample averages $\bar X_A$ and $\bar X_{NA}$ as estimates
of $\mu_A$ and $\mu_{NA}$, and the sample variances and covariances of the X values to estimate
$\Sigma$. Thus, $\lambda$ is estimated by (6):

$$\hat\lambda = S^{-1}(\bar X_A - \bar X_{NA}) \tag{6}$$

Ladd (reference 6) has shown that the $\hat\lambda$ vector obtained from (6) is directly proportional
to the regression coefficient vector $\hat\beta$ obtained from (3). This relationship is
shown in (7):

$$\hat\beta = \frac{ESS}{n-2}\,\hat\lambda \tag{7}$$

ESS in (7) is the error sum of squares from the linear regression (3). Thus, using a
linear discriminant function to assign individuals to group 1 (say, attriters) or group 2
(say, non-attriters) with a cutting score of $Z_0$ is equivalent to assignment on the basis
of the linear probability model with a cutting score of $p_0 = \left(\frac{ESS}{n-2}\right) Z_0$.
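
The proportionality between the LDF coefficients and the regression coefficients can be checked numerically. The sketch below uses simulated data and rests on two assumptions that are not spelled out in the text: the regression includes an intercept column (the proportionality then applies to the slope coefficients only), and S is the pooled within-group covariance matrix with divisor n - 2. Under those assumptions the factor of proportionality works out to ESS/(n - 2).

```python
import numpy as np

rng = np.random.default_rng(1)
n1, n2 = 300, 700                        # attriters, non-attriters (made up)
XA = rng.normal([1.0, 0.0], 1.0, size=(n1, 2))
XNA = rng.normal([0.0, 0.5], 1.0, size=(n2, 2))
X = np.vstack([XA, XNA])
y = np.concatenate([np.ones(n1), np.zeros(n2)])
n = n1 + n2

# LDF coefficients as in (6); S is the pooled covariance, divisor n - 2
W = (n1 - 1) * np.cov(XA, rowvar=False) + (n2 - 1) * np.cov(XNA, rowvar=False)
S = W / (n - 2)
lam = np.linalg.solve(S, XA.mean(axis=0) - XNA.mean(axis=0))

# Individual linear probability model (3) with an intercept column added
Z = np.column_stack([np.ones(n), X])
beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
ess = np.sum((y - Z @ beta) ** 2)        # error sum of squares

ratio = beta[1:] / lam                    # one common constant for all slopes
```

Because every slope is scaled by the same positive constant, ranking applicants by the fitted linear probability or by the discriminant score gives the same ordering.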

The LDF procedure is not subject to quite the same criticisms as the individual linear
probability model, even though the parameter estimates from the two procedures are proportional
to one another. Since the fitted LDF is not used to predict probabilities, but
only for classification, it is not subject to the criticism that it gives predicted probabilities
outside of the unit interval. In addition, there is no problem of heteroskedasticity since
the estimation procedure is not based on the assumption of OLS that the error term is
normally distributed with constant variance.

Grouped Linear Probability Model


An alternative to the linear probability model based on individual observations is the
grouped linear probability model. In this procedure, the observations are grouped into
cells based on all possible combinations of the independent variables. Grouping is easy if
all of the independent variables are categorical variables (e.g., race). If some of the variables
are continuous (e.g., education level or age), they have to be broken up into a (reasonably
small) number of intervals in order to group the data. The number of cells is the product,
over the number of classifiers (e.g., race, age, education), of the number of intervals
for each classifier. Thus, if there are five classifiers and 3 intervals for each classifier,
there will be $3^5 = 243$ cells into which observations can fall.


Once the data are grouped, the proportion, $\hat p_j = a_j/n_j$, of the $n_j$ individuals in the jth
cell who were attriters is computed. $\hat p_j$ is an estimate of the true probability $p_j$ that
individuals who fall into the jth cell will attrite. To estimate the grouped linear probability
model, $\hat p_j$ is regressed on dummy variables representing the different levels of the
classifiers.


One problem with a simple regression between $\hat p$ and X is that the error term in
the regression $(\hat p_j - p_j)$ has a non-constant variance, and hence the OLS estimator of $\beta$ is
not a minimum variance estimator. The variance of the error term $(\hat p_j - p_j)$ is
$p_j(1 - p_j)/n_j$ and is inversely related to the cell size $n_j$. This heteroskedasticity problem
is handled by weighting each observation by the inverse of the estimated standard deviation
of the error term,

$$\left[\frac{n_j}{\hat p_j(1 - \hat p_j)}\right]^{1/2}.$$

The weighted regression between

$$\hat p_j \left[\frac{n_j}{\hat p_j(1 - \hat p_j)}\right]^{1/2} \quad \text{and} \quad X_j \left[\frac{n_j}{\hat p_j(1 - \hat p_j)}\right]^{1/2}$$

gives the minimum variance unbiased estimator of $\beta$.
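
A minimal sketch of the grouped linear procedure follows. The four cells, their sizes, and their attriter counts are invented for illustration; only the weighting rule comes from the text.

```python
import numpy as np

# Hypothetical cells: one row per cell; columns are an intercept and two dummies
X = np.array([[1, 0, 0],
              [1, 1, 0],
              [1, 0, 1],
              [1, 1, 1]], dtype=float)
n_j = np.array([4000, 2500, 2000, 1500])   # cell sizes (made up)
a_j = np.array([400, 500, 300, 330])       # attriters per cell (made up)

p_hat = a_j / n_j                           # observed cell attrition rates
# Inverse of the estimated standard deviation of (p_hat_j - p_j)
w = np.sqrt(n_j / (p_hat * (1.0 - p_hat)))

# Weighted regression of p_hat on the cell dummies
beta, *_ = np.linalg.lstsq(X * w[:, None], p_hat * w, rcond=None)
fitted = X @ beta                           # fitted cell attrition rates
```

Large cells with rates near 0 or 1 receive the most weight, since their observed rates are the most precisely measured. A cell with an observed rate of exactly 0 or 1 would give an infinite weight and needs separate handling.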


Since the dependent variable in the grouped linear model is actually a rate rather than
a 1 or 0 criterion variable, as in the individual linear model, the grouped linear model seems
more in the spirit of a probability model. Since the dependent variable in the grouped linear
model lies in the unit interval, one would think that the predictions with the fitted model would
also be more likely to lie in this interval. Unfortunately, we have found with a large sample
that this is not necessarily the case.

Logistic Models

Because of their ease of application, the linear probability models are frequently
employed in the literature, especially the individual linear model. However, there are
reasons for suspecting that the linear probability model is a poor specification of $p(A \mid X)$.
McFadden (reference 3, p. 374) notes that the (weighted) least squares estimator of $\beta$
in (2) is very sensitive to specification error. Further, Cox (reference 7) and Day and
Kerridge (reference 8) show that, under a variety of assumptions, $p(A \mid X)$ is logistic
rather than linear.¹ The form of $p(A \mid X)$ for the logistic distribution is shown in (8):

$$p(A \mid X) = \frac{e^{X\alpha}}{1 + e^{X\alpha}}, \qquad \alpha = (\alpha_1, \ldots, \alpha_k)' \tag{8}$$

There are several ways to estimate the parameter vector in (8). If X is indeed
multivariate normal, the best linear estimates of the $\alpha$ vector in (8) would be the LDF
coefficients in (6). This follows since $X\alpha$ is normally distributed if X is multivariate
normal, and it was shown above that the $\hat\lambda$ vector in (7) is the best linear unbiased estimate
of $\alpha$ when X is multivariate normal.² However, Halperin, Blackwelder, and
Verter (reference 9) show that if X is not multivariate normal, the LDF estimator of $\alpha$
will not be consistent. Consistent, asymptotically efficient estimates of $\alpha$ may be obtained
from either the grouped logit or individual logit procedures.

Grouped Logistic Model

With a large sample, $\alpha$ can be estimated using linear regression. The logistic
probability function in (8) can be transformed into the following log-linear equation which
may be estimated with OLS:

$$\ln\left(\frac{p}{1 - p}\right) = X\alpha \tag{9}$$

The dependent variable here is the logarithm of the odds of being an attriter. To estimate
this equation, the data are grouped into cells just as in the grouped linear model. Then
$\ln(\hat p_j/(1 - \hat p_j))$ rather than $\hat p_j$ is used as the dependent variable in the regression.


¹ That $p(A \mid X)$ is logistic was originally derived from the assumption that X is multivariate
normal, but the authors cited show that $p(A \mid X)$ is logistic for a variety of other
conditions, including the case where all the independent variables are dichotomous.


² This implies that better estimates of attrition probabilities can be obtained by plugging
the LDF coefficients in (5) into (8) than by converting LDF parameter estimates to individual
linear probability model estimates via (7) and estimating attrition probabilities with a linear
equation.




One problem is that the error term in this regression has the non-constant variance
$1/(n_j p_j(1 - p_j))$. Weighted regression, where each observation is weighted by $\sqrt{n_j \hat p_j(1 - \hat p_j)}$,
yields the generalized least squares estimator of $\alpha$. This grouped logit procedure, due
to Berkson (reference 10), is known as the minimum logit chi-square method. Cox
(reference 7) shows that under very general conditions this method yields consistent,
asymptotically efficient estimates of $\alpha$.
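
The grouped logit (minimum logit chi-square) procedure can be sketched in a few lines. The cells and counts below are invented for illustration; the transformation and the weights follow the description above.

```python
import numpy as np

# Hypothetical cells: one row per cell; columns are an intercept and two dummies
X = np.array([[1, 0, 0],
              [1, 1, 0],
              [1, 0, 1],
              [1, 1, 1]], dtype=float)
n_j = np.array([4000, 2500, 2000, 1500])   # cell sizes (made up)
a_j = np.array([400, 500, 300, 330])       # attriters per cell (made up)

p_hat = a_j / n_j
# Log odds of attrition; undefined for cells with p_hat equal to 0 or 1
log_odds = np.log(p_hat / (1.0 - p_hat))
# Weight: inverse of the estimated standard deviation of the error term
w = np.sqrt(n_j * p_hat * (1.0 - p_hat))

alpha, *_ = np.linalg.lstsq(X * w[:, None], log_odds * w, rcond=None)
p_fit = 1.0 / (1.0 + np.exp(-(X @ alpha)))  # back-transform through (8)
```

Unlike the grouped linear model, the back-transformed predictions always fall strictly inside the unit interval, whatever values the fitted coefficients take.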

Individual  Logistic  Model 


The logistic probability function in (8) is a non-linear equation which may also be
estimated by the method of maximum likelihood. Maximum likelihood estimation of (8)
was developed because the grouped logit procedure is inapplicable in small samples
where many cells are empty or have only a few observations. As Nerlove and Press
(reference 2, p. 60) state, the maximum likelihood procedure yields parameter estimates
that have desirable small sample properties.

To estimate (8) by the method of maximum likelihood, the likelihood function is formed,
and the $\alpha$ vector which maximizes the value of the likelihood function is found. Since individual
observations are used, we call this model the individual logistic model. The
likelihood function is:

$$L = \prod_{y_i = 1} \frac{e^{X_i\alpha}}{1 + e^{X_i\alpha}} \prod_{y_i = 0} \frac{1}{1 + e^{X_i\alpha}} \tag{10}$$


Since (10) is not a simple linear expression, the $\alpha$ vector has to be estimated by
a non-linear, iterative technique. Using the Newton-Raphson technique, the $\alpha$ vector
is estimated as follows. The logarithm of the likelihood function L is computed, and
then the partial derivative of $\ln L$ with respect to each $\alpha_j$, $(\partial \ln L/\partial \alpha_j)$, is computed.
Denote this k x 1 vector of partial derivatives by $\ell(\alpha)$. This vector is called the "score."
The point at which $\ell(\alpha) = 0$ is called the "efficient score," since the likelihood function
is maximized at this point.¹


¹ The equations that make up the efficient score are similar to the normal equations in a
linear regression, but are non-linear and cannot be solved analytically as can the normal
equations.




Next, the k x k matrix of second partial derivatives of $\ln L$ with respect to $\alpha$,
$(\partial^2 \ln L/\partial\alpha_i \partial\alpha_j)$, is calculated. Denote this matrix by $\ell'(\alpha)$. The vector $\alpha$ is then
estimated iteratively as follows:¹

$$\hat\alpha_{m+1} = \hat\alpha_m - [\ell'(\hat\alpha_m)]^{-1}\,\ell(\hat\alpha_m) \tag{11}$$

The m subscript refers to the mth iteration. $[\ell'(\hat\alpha_m)]^{-1}$ is the inverse of $\ell'(\hat\alpha_m)$.
On each iteration, $[\ell'(\hat\alpha_m)]^{-1}$ and $\ell(\hat\alpha_m)$ are evaluated with the sample data. The best
fit (i.e., the $\alpha$ vector such that $\ell(\alpha) = 0$) is found when $[\ell'(\hat\alpha_m)]^{-1}\ell(\hat\alpha_m)$ converges
to zero. The "start values" in the iteration process are the LDF coefficients in (6).

The ML estimate of $\alpha$ is normally distributed with asymptotic covariance matrix
$-[\ell'(\hat\alpha_m)]^{-1}$. Thus, a t-test of the significance of $\hat\alpha_i$ is $\hat\alpha_i/S_i$, where $S_i$ is the square
root of the ith diagonal element of $-[\ell'(\hat\alpha_m)]^{-1}$.


¹ It was noted above that $\ell(\alpha) = 0$ cannot be solved analytically for $\alpha$. However, equation
(11) for $\hat\alpha$ is derived as follows. If $\ell(\alpha)$ is expanded in a Taylor series around the
arbitrarily selected point $\alpha_0$, then

$$\ell(\alpha) = \ell(\alpha_0) + (\alpha - \alpha_0)\,\ell'(\alpha_0) + \tfrac{1}{2}(\alpha - \alpha_0)^2\,\ell''(\alpha_0) + \ldots$$

Ignoring $\tfrac{1}{2}(\alpha - \alpha_0)^2\,\ell''(\alpha_0)$ and other higher order terms, setting $\ell(\alpha)$ equal to
zero, and solving for $\alpha$, we find that

$$\alpha = \alpha_0 - [\ell'(\alpha_0)]^{-1}\,\ell(\alpha_0).$$

This equation gives a value for $\alpha$ by expanding $\ell(\alpha)$ around the arbitrary point $\alpha_0$.
The best fitting $\alpha$, $\hat\alpha_m$, is found by iterating on $\alpha_0$ until $[\ell'(\alpha_0)]^{-1}\ell(\alpha_0)$ vanishes.
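
The Newton-Raphson iteration for the individual logit model can be sketched as below. This is an illustration on simulated data rather than the program used in the study. The function name `logit_newton`, the zero start values (the text starts from the LDF coefficients), the tolerance, and the simulated coefficients are all assumptions of the sketch.

```python
import numpy as np

def logit_newton(X, y, alpha0=None, tol=1e-10, max_iter=50):
    """Maximize the logit log-likelihood by Newton-Raphson."""
    k = X.shape[1]
    alpha = np.zeros(k) if alpha0 is None else np.asarray(alpha0, dtype=float)
    for _ in range(max_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ alpha)))
        score = X.T @ (y - p)                             # first derivatives
        hessian = -(X * (p * (1.0 - p))[:, None]).T @ X   # second derivatives
        step = np.linalg.solve(hessian, score)
        alpha = alpha - step                              # Newton update
        if np.max(np.abs(step)) < tol:
            break
    # Asymptotic covariance: negative inverse Hessian from the last iteration
    cov = np.linalg.inv(-hessian)
    return alpha, np.sqrt(np.diag(cov))

rng = np.random.default_rng(2)
n = 4000
x = rng.normal(size=n)                                    # hypothetical regressor
X = np.column_stack([np.ones(n), x])
true_alpha = np.array([-1.5, 0.8])                        # assumed coefficients
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-(X @ true_alpha)))).astype(float)

alpha_hat, std_err = logit_newton(X, y)
t_stats = alpha_hat / std_err
```

Each iteration requires a pass over every observation to rebuild the score and the matrix of second derivatives, which is why the individual logit model is so much more expensive than the grouped procedures in large samples.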


Finally, it is worthwhile to note the similarity between parameter estimates obtained
from a model based on the logit distribution and those from a model based on the normal
distribution. Instead of assuming a logit distribution, one could assume that attrition
probabilities follow a normal distribution with unit variance ($\sigma^2 = 1$):

$$p = \int_{-\infty}^{X\gamma} \frac{1}{\sqrt{2\pi}}\, e^{-t^2/2}\, dt \tag{12}$$

The parameters in (12) must be estimated by maximum likelihood. This model is called
the probit model. While the logit model in (9) and the probit model in (12) look different,
their cumulative distributions are very similar. The logit distribution has variance $\pi^2/3$.
Therefore, if the $\hat\alpha$ obtained from the logit estimation procedure is weighted by $\sqrt{3}/\pi$,
it will be virtually the same as the $\hat\gamma$ obtained from the probit estimation. The logit
and probit estimates differ only by the scale factor $\sqrt{3}/\pi$. Logit estimation is used
more often than probit estimation, because the logit probability function is closed form
(does not have an integral that must be evaluated) and is therefore much easier to estimate.
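
The closeness of the two cumulative distributions after rescaling can be checked directly. The sketch below compares the standard normal distribution function with a logistic distribution function rescaled to unit variance; the grid of evaluation points is an arbitrary choice.

```python
import math

def logistic_cdf(z):
    """Cumulative logistic distribution (variance pi^2 / 3)."""
    return 1.0 / (1.0 + math.exp(-z))

def normal_cdf(z):
    """Cumulative standard normal distribution, via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Ratio of the normal to the logistic standard deviation
scale = math.sqrt(3.0) / math.pi

# A probit index z corresponds to a logit index z / scale
grid = [i / 100.0 for i in range(-400, 401)]
max_gap = max(abs(normal_cdf(z) - logistic_cdf(z / scale)) for z in grid)
```

On this grid the largest gap between the two curves stays below 0.03, so the two models imply nearly the same probabilities once the coefficients are put on a common scale.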


EMPIRICAL  RESULTS 


The four models discussed above were applied to data from the CY 1973 cohort of
non-prior service male enlistees. The dependent variable was whether or not the individual
was lost before the end of one year of service. The independent variables were years of
education, mental ability as measured by the Armed Forces Qualification Test, marital
status, age, and race. Education was split into three categories: less than 12 years, 12
years, and more than 12 years. Individuals were classified into five standard mental
groups (I, II, IIIU, IIIL, and IV) on the basis of their AFQT scores. Age was split into
three categories: less than 18 years, 18 or 19 years, and greater than 19 years. The
various combinations of education level, mental ability, age, race, and marital status
(3x5x3x2x2) give rise to 180 cells that individuals can fall into.
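
The cell structure can be enumerated with a short sketch. The category labels for race and marital status are placeholders, since the text gives only the number of levels for those two classifiers.

```python
from itertools import product

# Levels of each classifier; race and marital labels are placeholders
classifiers = {
    "education": ["<12", "12", ">12"],
    "mental_group": ["I", "II", "IIIU", "IIIL", "IV"],
    "age": ["<18", "18-19", ">19"],
    "race": ["race_1", "race_2"],
    "marital_status": ["status_1", "status_2"],
}

# Every combination of levels defines one cell
cells = list(product(*classifiers.values()))
n_cells = len(cells)
```

Each tuple in `cells` identifies one of the 180 groups whose attrition rate serves as an observation in the grouped models.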


The CY 1973 cohort contained approximately 67,000 men. We divided the first
60,000 of them into 2 samples of 30,000 each (with 7,000 left over) by alternately
assigning individuals to an "A" sample and a "B" sample. Then, the four models described
above were estimated with each sample of data. Splitting the cohort into samples
of 30,000 was necessary for comparing the individual logit model with the other models,
because the maximum likelihood computer program used to estimate this model can
accommodate a maximum of 30,000 observations. Even with 30,000 observations, 2.5
hours of computer time were required to estimate it.
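
The alternating assignment to the two samples can be sketched as follows. The function name `alternate_split` and the integer stand-ins for personnel records are assumptions; the ordering of records within the cohort file is not described in the text.

```python
def alternate_split(records, per_sample=30000):
    """Assign records alternately to samples A and B, up to per_sample each."""
    used = records[: 2 * per_sample]
    leftover = records[2 * per_sample:]
    # Even positions go to sample A, odd positions to sample B
    return used[0::2], used[1::2], leftover

cohort = list(range(67000))          # stand-in record identifiers
sample_a, sample_b, leftover = alternate_split(cohort)
```

Each sample can then serve in turn as the estimation sample while the other serves as the cross-validation sample.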


After the four models were fit with each sample of data, the ability of each fitted
equation to discriminate between the attriters and the non-attriters in the other (cross-validation)
sample was examined. On the basis of qualifying scores ranging from 60 to
100, each individual in the cross-validation sample was classified as an attriter or non-attriter.
Thus, if the qualifying score is 75, individuals who have lower survival chances
are labeled attriters and individuals with equal or higher scores are labeled non-attriters.
For scores ranging from 60 to 100, we examined: (1) the percentage of the cross-validation
sample that would be selected, (2) the "hit" rate, or percent of sample correctly
identified as either attriters or non-attriters, (3) the "false negative" rate, or percent
of sample labeled as attriters who actually stayed, and (4) the "false positive" rate, or
percent of sample labeled as non-attriters who actually left.
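
The four measures can be computed as in the sketch below. The six individuals and their predicted survival chances are invented, and all four rates are expressed as a percentage of the whole cross-validation sample, which is one reading of the definitions above.

```python
def screening_rates(survival_chance, stayed, qualifying_score):
    """Selection ratio, hit rate, and both error rates, in percent of sample."""
    n = len(stayed)
    # Individuals at or above the qualifying score are labeled non-attriters
    selected = [s >= qualifying_score for s in survival_chance]
    pairs = list(zip(selected, stayed))
    hits = sum(sel == st for sel, st in pairs)
    false_neg = sum((not sel) and st for sel, st in pairs)   # rejected, stayed
    false_pos = sum(sel and (not st) for sel, st in pairs)   # selected, left
    return {
        "selected": 100.0 * sum(selected) / n,
        "hit": 100.0 * hits / n,
        "false_negative": 100.0 * false_neg / n,
        "false_positive": 100.0 * false_pos / n,
    }

scores = [62, 70, 78, 81, 90, 95]        # predicted survival chances, percent
stayed = [False, True, False, True, True, True]
rates = screening_rates(scores, stayed, qualifying_score=75)
```

Under this reading the hit rate and the two error rates sum to 100 percent, so raising the qualifying score trades false positives for false negatives.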

The  Parameter  Estimates 

Table 1 shows the parameter estimates obtained by applying the four procedures
described above to one of the samples. Estimates obtained with the second sample are
contained in appendix A. The estimates shown in the column labeled "Individual Linear"
are those obtained with the weighted regression procedure described in the last section.¹
Table 1 also shows the LDF coefficients, which are proportional to the unweighted estimates
(not shown) of the individual linear probability model.

Several conclusions are apparent from table 1. With the large sample used here,
each of the two grouped models gives virtually the same fitted equation as its individual
counterpart. Especially in the case of the two linear models, the parameter estimates
obtained with the grouped linear model are in most cases the same down to the third decimal
place as those obtained with the individual linear model. Differences in the predicted
attrition probabilities obtained with the two linear equations are quite small. Although not
as obvious, the differences in the estimates from the two logit models also imply trivial
differences in estimated attrition probabilities.² The parameter estimates from either
of the two logit procedures differ from the LDF coefficients on precisely the variables
that have the most impact on the probability of attrition, the education and mental group
variables. For most of the other variables, the deviations of the logit coefficients from
the LDF coefficients are small.

¹ It was noted above that the unweighted estimates of the individual linear probability
model gave predicted attrition chances of less than zero in some cases. This occurred
for individuals who had more than 12 years of education and who were in mental group I.
These individuals made up about 2 percent of the sample. The problem of negative weights
in the weighted regression procedure was handled by assigning these individuals an attrition
probability of .02.

² A difference in a parameter estimate between the two procedures of about .10 will imply
a difference in the predicted attrition probability of about .01. Most of the differences
between the parameter estimates obtained with the two logit procedures are considerably
smaller than .10.

ESTIMATES OF PARAMETER VALUES,
SAMPLE FROM CY 1973 COHORT


The close correspondence between the parameter estimates from the two logit
models is to be expected, since both have been shown in the theoretical literature to be
consistent and asymptotically efficient. The similarity of results is important, because
with large samples the individual logit model is considerably more expensive to estimate.

The  Selection  Ratio  and  Distributions  of  Correct  and  Incorrect  Predictions  for  Linear 
and  Logit  Models 

Figures 1 through 4 show the selection ratio, hit rate, false positive rate, and
false negative rate for the individual linear and logit models for qualifying scores ranging
from 60 to 100. While the distributions in these figures are based on the individual
linear and logit models, the distributions obtained with the grouped models were virtually
the same. The differences that exist between models are between the linear models and
the logit models, not between the two versions of the same model.

Looking first at figure 1, at a qualifying score of 60, all of the cross-validation
sample would be admitted. As the qualifying score is raised, fewer people
are selected. The sizeable differences between the selection ratios implied by the two
methods occur in the range of qualifying scores between 74 and 82. In this range, a
higher percentage of the cross-validation sample would be selected with the logit model
than with the linear model. The maximum difference between models occurs at a qualifying
score of 79, where 5 percent more people would be selected using the logit model.

The hit rate distribution is shown in figure 2. Again, the range where sizeable
differences in hit rates occur lies between qualifying scores of 74 and 82. In this range,
the logit model gives a somewhat higher hit rate than does the linear model. Again, the
maximum difference occurs at a qualifying score of 79. Here the logit model gives a 3
percent higher hit rate than the linear model.

Figures 3 and 4 show the rate of false positive predictions and the rate of false
negative predictions for the linear and logit models. As figure 3 shows, the rate of false
positive predictions declines as the qualifying score is raised, because only those whose
survival chances exceed the higher qualifying score are selected. Again, most of the
differences between the models occur in the range 74 to 82. At the qualifying score of
79, the logit model gives a one percent higher rate of false positives than the linear model.
The higher rate of false positives for the logit model in the range 74 to 82 is due to the
fact that in this range, a higher percentage of the applicant cohort would be enlisted using
the logit model (recall figure 1).

Looking at the rate of false negative predictions in figure 4, we see again that the
differences between the logit and linear models occur in the range 74 to 82. Again, the
maximum difference occurs at the qualifying score of 79, where the logit model has a 4
percent lower false negative rate than the linear model.

To summarize, our results indicate that individuals who have very low or very high
chances of early attrition will be correctly classified by either model. Thus, for qualifying
scores below 74 or above 82, the two models give about the same rate of hits, false
positives, and false negatives. However, in the range between 74 and 82, the logit model
gives a higher rate of hits and a lower rate of false negatives, but a higher rate of false
positives. It is significant that this is the area of greatest overlap between attriters and
non-attriters. Seventy-eight is the average SCREEN score of attriters, while 82 is the average
score of non-attriters. In the range where overlap occurs, the logit model gives slightly
better discrimination between attriters and non-attriters than the linear model.
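
The four quantities compared in figures 1 through 4 can be computed from a cross-validation
sample as in the following minimal sketch. It assumes that each rate is a fraction of the
whole sample, that an applicant is selected when the predicted survival chance (on the 0 to
100 SCREEN scale) equals or exceeds the qualifying score, and that a hit is any correct
prediction. The function name and the toy data are hypothetical and are not taken from the
study. With these conventions the hit, false positive, and false negative rates sum to one,
which agrees with the differences reported at a qualifying score of 79 (3 + 1 - 4 = 0).

```python
def classification_rates(scores, survived, qualifying_score):
    """Return selection ratio, hit rate, false positive and false negative rates.

    scores: predicted survival chances on a 0-100 scale (hypothetical data).
    survived: True for actual successes (non-attriters), False for attriters.
    All four rates are fractions of the whole sample (an assumption).
    """
    n = len(scores)
    if n == 0 or n != len(survived):
        raise ValueError("scores and survived must be non-empty and equal length")
    # An applicant is selected when the score equals or exceeds the cutoff.
    selected = [s >= qualifying_score for s in scores]
    pairs = list(zip(selected, survived))
    true_pos = sum(1 for sel, ok in pairs if sel and ok)
    true_neg = sum(1 for sel, ok in pairs if not sel and not ok)
    false_pos = sum(1 for sel, ok in pairs if sel and not ok)   # selected attriters
    false_neg = sum(1 for sel, ok in pairs if not sel and ok)   # rejected survivors
    return {
        "selection_ratio": (true_pos + false_pos) / n,
        "hit_rate": (true_pos + true_neg) / n,
        "false_positive_rate": false_pos / n,
        "false_negative_rate": false_neg / n,
    }


# Toy example: six applicants screened at a qualifying score of 79.
toy_scores = [65, 76, 78, 80, 84, 92]
toy_survived = [False, True, False, False, True, True]
rates = classification_rates(toy_scores, toy_survived, 79)
for name, value in rates.items():
    print(name, round(value, 3))
```

Evaluating the function over a grid of qualifying scores from 60 to 100 for each model would
trace out curves of the kind plotted in the figures.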

Grouped logit and grouped linear equations fit to the whole CY 1973 cohort are found
in Lockman (reference 11). These equations are reproduced in appendix B. The Navy is
now using tables based on the grouped logit model to screen recruits, so we wanted to
determine how well these two models distinguish between attriters and non-attriters in
another cohort. Therefore, these equations were applied to the CY 1974 cohort. The selection
ratio, the hit rate, false positive rate, and false negative rate distributions are shown
in figures 5, 6, 7, and 8, respectively. Although the patterns are similar to the ones shown
previously, the differences between the logit and linear models are much less pronounced.
Whereas we found virtually no differences in the lower tails of the distributions in figures
1 through 4 above, we do find some differences in figures 5 through 8.

Prediction  of  Attrition  Rates  with  Linear  and  Logit  Models 

In addition to using the linear or logit models for classification, we are also interested
in just how well they predict future attrition rates. Even if the models are not used for
recruit screening purposes, they could still be used to predict the attrition that will be
suffered. As noted above, theory tells us that the logit model is a better specification of
P(A | X) than the linear model. If so, the logit model should have smaller errors in
predicting future attrition rates than the linear model.
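
To make the difference between the two specifications concrete, the sketch below evaluates
the grouped linear and grouped logit equations of appendix B for two hypothetical applicants.
It assumes that both equations predict the chance of surviving the first year, that the logit
equation gives the log-odds of that chance, and that the omitted categories form the base
group. These readings are assumptions, as are the dictionary keys and function names; only
the coefficient values come from appendix B.

```python
import math

# Coefficients copied from appendix B (total CY 1973 cohort); only the
# variables needed for the two hypothetical profiles are listed.
LOGIT = {"constant": 1.976, "ed_gt_12": 0.314, "mental_group_1": 0.989,
         "race_non_caucasian": 0.119, "ed_lt_12": -0.701, "mental_group_4": -0.597}
LINEAR = {"constant": 0.882, "ed_gt_12": 0.031, "mental_group_1": 0.079,
          "race_non_caucasian": 0.034, "ed_lt_12": -0.111, "mental_group_4": -0.100}


def index_value(coefs, traits):
    """Sum the constant and the coefficients of the traits the applicant has."""
    return coefs["constant"] + sum(coefs[t] for t in traits)


def linear_chance(traits):
    """Linear probability model: the index itself is the predicted chance."""
    return index_value(LINEAR, traits)


def logit_chance(traits):
    """Logit model: the index is treated as a log-odds and mapped into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-index_value(LOGIT, traits)))


strong = ["ed_gt_12", "mental_group_1", "race_non_caucasian"]  # hypothetical profile
weak = ["ed_lt_12", "mental_group_4"]                          # hypothetical profile
for label, traits in (("strong", strong), ("weak", weak)):
    print(label, round(linear_chance(traits), 3), round(logit_chance(traits), 3))
```

For the first profile the linear index sums to .882 + .031 + .079 + .034 = 1.026, which is not
a valid probability, while the logit value stays below one. This is one way to see why the
two specifications can disagree most for applicants far from the base group.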

To see if this is true, we used grouped linear and grouped logit equations based on
all of the data from the CY 1973 cohort to predict the attrition rates for the 137 cells in
the CY 1974 cohort that contained observations. We computed two values reflecting the
predictive ability of the two equations. The first is an error sum of squares,
$\sum_j (P_j - \hat{P}_j)^2$, where $P_j$ is the actual and $\hat{P}_j$ the predicted
attrition rate in cell $j$. The second is an error sum of squares which weights the square
of the error by the number of observations in the cell, $\sum_j N_j (P_j - \hat{P}_j)^2$.
This statistic should provide a better comparison of prediction errors for two reasons.
First, it weights each error by the "cost" of the error; prediction errors are more expensive
the more individuals there are in a cell. Second, it could be that the larger errors are
occurring in the upper or lower tails of the probability distribution where the cell sizes
are small. The linear model might be predicting as well in the middle of the distribution
(say, where the attrition probabilities lie between .1 and .3) but doing poorly in the tails.
If this is so, the difference between the models in $\sum_j N_j (P_j - \hat{P}_j)^2$ should
be smaller than the difference in $\sum_j (P_j - \hat{P}_j)^2$.
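
The two measures just described can be sketched as follows. The cell data are invented to
show how weighting by cell size can change the comparison when one set of predictions misses
only in small tail cells and the other misses only in large middle cells; the function name
and all numbers are hypothetical.

```python
def error_sums_of_squares(actual, predicted, cell_sizes):
    """Return (unweighted, weighted) error sums of squares over cells.

    actual, predicted: attrition rates per cell; cell_sizes: observations per cell.
    """
    if not (len(actual) == len(predicted) == len(cell_sizes)):
        raise ValueError("all inputs must have one entry per cell")
    # Unweighted: every cell counts equally, however few people it holds.
    unweighted = sum((p - q) ** 2 for p, q in zip(actual, predicted))
    # Weighted: each squared error is multiplied by the cell's head count.
    weighted = sum(n * (p - q) ** 2
                   for p, q, n in zip(actual, predicted, cell_sizes))
    return unweighted, weighted


# Hypothetical cells: two small tail cells and two large middle cells.
actual = [0.05, 0.20, 0.25, 0.60]
sizes = [10, 500, 400, 20]
predicted_a = [0.15, 0.20, 0.25, 0.50]   # misses only in the small tail cells
predicted_b = [0.05, 0.27, 0.18, 0.60]   # misses only in the large middle cells

ess_a = error_sums_of_squares(actual, predicted_a, sizes)
ess_b = error_sums_of_squares(actual, predicted_b, sizes)
print("A:", ess_a)
print("B:", ess_b)
```

In this toy case predictions A look worse on the unweighted measure but better on the
weighted one, which is the pattern the second reason above anticipates for a model whose
large errors sit in sparsely populated cells.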

Table 2 presents the results of these computations. Based on either measure, the
logit model is found to give smaller prediction errors than the linear model. It is somewhat
surprising that there is a larger (percentage) difference in the weighted error sum
of squares between the models than in the unweighted error sum of squares. Differences
between methods were not just due to the linear model having larger prediction errors in
small cells or cells at the extremes of the probability distribution. These results indicate
that the logit specification of P(A | X) was a better predictor than the linear specification
of P(A | X).


TABLE  2 

TWO  MEASURES  OF  PREDICTION  ERROR,  CY  1973  EQUATIONS 
APPLIED  TO  CY  1974  COHORT 

                                    Grouped logit   Grouped linear

$\sum_j (P_j - \hat{P}_j)^2$             4.08            4.38

$\sum_j N_j (P_j - \hat{P}_j)^2$       165.15          205.76
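
The percentage differences mentioned in the text can be recovered from the table 2 values
with a short calculation. Expressing the gap as a share of the linear model's error is a
choice made for this illustration; the text does not state which base was used, and the
names below are hypothetical.

```python
# Values copied from table 2.
unweighted = {"logit": 4.08, "linear": 4.38}
weighted = {"logit": 165.15, "linear": 205.76}


def percent_reduction(measure):
    """Percentage by which the logit error falls below the linear error."""
    return 100.0 * (measure["linear"] - measure["logit"]) / measure["linear"]


print(round(percent_reduction(unweighted), 1))  # about 6.8
print(round(percent_reduction(weighted), 1))    # about 19.7
```

On this basis the gap between the models is roughly three times as large for the weighted
measure as for the unweighted one.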

CONCLUSIONS 

Several general conclusions emerge from our empirical analysis. First, with large
samples, the individual linear and logit models give virtually the same fitted equation as
their grouped counterparts. This is essentially an empirical demonstration of the fact that
each individual model has the same asymptotic properties as its grouped counterpart. Knowing
that the grouped logit model based on linear regression and the individual logit model
based on maximum likelihood yield the same fitted equation is extremely useful, because
maximum likelihood estimation is computationally expensive in very large samples.
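
The computational appeal of the grouped approach can be seen in a minimal sketch: cell
proportions are converted to log-odds and fit by weighted least squares in closed form,
with no iterative search over individual records. The weights N_j p_j (1 - p_j) follow the
minimum chi-square approach discussed by Berkson (reference 10); whether the study used
exactly these weights is an assumption, and the single covariate, the cell data, and the
function name are hypothetical.

```python
import math


def grouped_logit_wls(x, n, p):
    """Fit log(p / (1 - p)) = a + b * x across cells by weighted least squares.

    x: cell covariate values; n: cell sizes; p: observed cell proportions,
    each strictly between 0 and 1. Weights are n * p * (1 - p).
    """
    y = [math.log(pj / (1.0 - pj)) for pj in p]          # cell log-odds
    w = [nj * pj * (1.0 - pj) for nj, pj in zip(n, p)]   # minimum chi-square weights
    sw = sum(w)
    x_bar = sum(wj * xj for wj, xj in zip(w, x)) / sw
    y_bar = sum(wj * yj for wj, yj in zip(w, y)) / sw
    sxy = sum(wj * (xj - x_bar) * (yj - y_bar) for wj, xj, yj in zip(w, x, y))
    sxx = sum(wj * (xj - x_bar) ** 2 for wj, xj in zip(w, x))
    b = sxy / sxx
    a = y_bar - b * x_bar
    return a, b


# Hypothetical cells whose proportions lie exactly on a logit curve.
true_a, true_b = 1.0, 0.5
cell_x = [0.0, 1.0, 2.0, 3.0]
cell_n = [400, 300, 200, 100]
cell_p = [1.0 / (1.0 + math.exp(-(true_a + true_b * xj))) for xj in cell_x]

a_hat, b_hat = grouped_logit_wls(cell_x, cell_n, cell_p)
print(round(a_hat, 6), round(b_hat, 6))
```

Because the work depends on the number of cells rather than the number of individuals, the
cost of a fit like this stays small even when the cells summarize a very large cohort.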

Second, the logit models are found to be superior to the linear models on several
counts. For a range of qualifying scores most likely to be used by the Navy to separate
acceptable from unacceptable applicants, the logit models give somewhat better prediction
of actual success or failure. They are also found to give a lower rate of "false negatives"
(predicted failures who are actual successes). However, they do give a slightly higher
rate of "false positives" (predicted successes who are actual failures).

Third, the grouped logit model based on data from one cohort (CY 1973 enlistees)
was found to give better estimates of attrition rates of different groups in another cohort
(CY 1974 enlistees) than the grouped linear model based on the same data. Goodness of
fit was measured by both weighted and unweighted error sums of squares in prediction.
Consequently, the grouped logit model is the best model for the prediction of attrition
with very large samples.


REFERENCES 


1. Fisher, R.A., "The Use of Multiple Measurements in Taxonomic Problems,"
Annals of Eugenics, Vol. 7, 1936, pp. 179-188.

2. Nerlove, M. and J.S. Press, Univariate and Multivariate Log-Linear and Logistic
Models, Report R-1306-EDA/NIH, The RAND Corporation, Dec. 1973.

3. McFadden, D., "Quantal Choice Analysis: A Survey," Annals of Economic and Social
Measurement, Vol. 5, Fall 1976, pp. 363-390.

4. Goldberger, A., Econometric Theory, John Wiley and Sons, New York, 1964.

5. Smith, V.K., and C.J. Cicchetti, "Estimation of Linear Probability Models with
Dichotomous Dependent Variables," Resources for the Future, 1972 (memo).

6. Ladd, G., "Linear Probability Functions and Discriminant Functions," Econometrica,
Vol. 34, 1966, pp. 873-885.

7. Cox, D., Analysis of Binary Data, Methuen, London, 1970.

8. Day, N.E., and D.F. Kerridge, "A General Maximum Likelihood Discriminant,"
Biometrics, Vol. 23, 1967, pp. 313-323.

9. Halperin, M., Blackwelder, W.D., and J.I. Verter, "Estimation of the Multivariate
Logistic Risk Function: A Comparison of the Discriminant and Maximum Likelihood
Approaches," Journal of Chronic Diseases, Vol. 24, 1971, pp. 125-158.

10. Berkson, J., "Maximum Likelihood and Minimum Chi-Square Estimates of the
Logistic Function," Journal of the American Statistical Association, Vol. 50,
March 1955, pp. 130-162.

11. Center for Naval Analyses, Study 1068, "Chances of Surviving the First Year of
Service: A New Technique for Use in Making Recruiting Policy and Screening
Applicants for the Navy," by R.F. Lockman, Unclassified, November 1975.


APPENDIX  A 


ESTIMATES  OF  PARAMETER  VALUES, 
SAMPLE  B FROM  CY  1973  COHORT 


Variable                    Individual  Grouped     Individual  Grouped     Linear
                            linear      linear      logit       logit       discriminant
                                                                            function

Ed < 12                     -.112       -.117       -.727       -.713       -.849
                            a8.pi)      (15.78)     (22.99)*    (16.45)     (20.92)

Ed > 12                     .017        .025        .252        .201        .181
                            (2.32)      (2.88)      (3.25)      (2.13)      (2.74)

Mental Group I              .060        .064        .848        .752        .460
                            (7.79)      (7.20)      (6.70)      (5.04)      (5.50)

Mental Group II             .017        .018        .199        .197        .140
                            (3.34)      (2.90)      (4.47)      (3.62)      (3.32)

Mental Group IIIL           -.075       -.075       -.484       -.480       -.551
                            (10.64)     (9.08)      (10.55)     (8.85)      (4.37)

Mental Group IV             -.110       -.111       -.652       -.642       -.794
                            (13.95)     (12.11)     (14.58)     (11.56)     (15.38)

Marital status (married)    -.061       -.042       -.435       -.487       -.470
                            (6.21)      (3.94)      (6.87)      (6.67)      (7.10)

Age < 18                    -.026       -.014       -.078       -.104       -.083
                            (3.43)      (1.55)      (1.75)      (2.05)      (1.67)

Age > 19                    -.017       -.020       -.148       -.135       -.134
                            (3.18)      (3.28)      (3.31)      (2.70)      (3.25)

Race (Non-Caucasian)        .013        .022        .075        .035        .113
                            (1.67)      (2.48)      (1.68)      (.67)       (2.10)

Constant                    .890        .890        2.017       2.101       2.141
                            (24.20)     (20.74)                 (43.21)     (22.16)

N                           30,000      131         30,000      131         30,000

*"t" values in parentheses.


APPENDIX B


PARAMETER ESTIMATES FOR GROUPED LOGIT AND GROUPED LINEAR MODELS,
BASED ON TOTAL CY 1973 COHORT


Variable                    Grouped logit   Grouped linear

Ed < 12                     -.701           -.111
                            (21.20)         (19.03)

Ed > 12                     .314            .031
                            (4.42)          (4.49)

Mental Group I              .989            .079
                            (8.37)          (10.85)

Mental Group II             .254            .026
                            (6.22)          (5.28)

Mental Group IIIL           -.365           -.052
                            (8.85)          (7.91)

Mental Group IV             -.597           -.100
                            (14.23)         (13.44)

Marital status (married)    -.389           -.038
                            (6.95)          (4.36)

Age < 18                    -.093           -.015
                            (2.76)          (2.89)

Age > 19                    -.280           -.032
                            (6.43)          (5.43)

Race (Non-Caucasian)        .119            .034
                            (2.64)          (4.89)

Constant                    1.976           .882
                            (57.35)         (26.89)

N                           137             137

*"t" values in parentheses.